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ABSTRACT 

This paper presents a general framework for generating greedy algo¬ 
rithms for solving convex constraint satisfaction problems for sparse 
solutions by mapping the satisfaction problem into one of graph 
traversal on a rooted tree of unknown topology. For every pre-walk 
of the tree an initial set of generally dense feasible solutions is pro¬ 
cessed in such a way that the sparsity of each solution increases with 
each generation unveiled. The specific computation performed at 
any particular child node is shown to correspond to an embedding of 
a poly tope into the poly tope received from that nodes parent. Several 
issues related to pre-walk order selection, computational complexity 
and tractability, and the use of heuristic and/or side information is 
discussed. An example of a single-path, depth-first algorithm on a 
tree with randomized vertex reduction and a run-time path selection 
algorithm is presented in the context of sparse lowpass filter design. 

Index Terms — sparse constraint satisfaction, tree-search, incre¬ 
mental refinement, sparse filter design 

1. INTRODUCTION 

Sparsity principles, in a broad sense, have been and continue to be 
associated with desirable properties spanning a wide range of dis¬ 
ciplines including sampling theory, model-order reduction, the de¬ 
sign of power efficient systems, architectures for deep learning, etc. 
From the compressive sensing paradigm to the design of sparse fil¬ 
ters, solving a convex constraint satisfaction problem for a maxi¬ 
mally sparse solution is computationally difficult in general m This 
paper focuses attention to the class of sparse constraint satisfaction 
problems which are readily described using a non-convex optimiza¬ 
tion problem of the form 

x G argmin ||x||o (1) 

x£S 

where || • ||o indicates the number of non-zero coordinates in its argu¬ 
ment and S is a generally convex set. Conventional sparse recovery 
methods used in compressive sampling and elsewhere can be broadly 
categorized into one of two classes: convex optimization algorithms 
and greedy iterative methods. When S satisfies particular constraint 
qualification conditions, e.g., the restricted isometry property, Eq.Q] 
and a convex relaxation of the objective function to the 1-norm have 
been shown to produce identical solutions (2). The verification of 
such qualifications, however, is often itself a computationally ex¬ 
pensive or intractable task. 

Understanding the conditions under which various optimality 
guarantees can be made as well as the gracefulness with which such 
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guarantees degrade when the conditions are violated but almost satis¬ 
fied has been the focus of much attention in the literature, especially 
surrounding the performance of greedy methods. In fact, greedy 
methods often provably result in a maximally sparse solution for 
particular classes of problems (3). A well-known strategy involves 
incrementally decreasing the sparsity of a single solution in accor¬ 
dance with a suitable error metric until a particular stopping criterion 
is met. For example, (orthogonal) matching pursuit is a method of 
this type (4). Another strategy, encompassing methods such as Iter¬ 
ative Hard Thresholding (IHT), Subspace Pursuit, and Compressive 
Sampling Matching Pursuit (CoSaMP), iteratively thresholds dense 
solutions to a predetermined sparsity level while simultaneously at¬ 
tempting to decrease an error metric EHE). The framework pre¬ 
sented in this paper fundamentally differs from these strategies in 
that the sparsity of many solutions is monotonically increased via 
tree-based processing while feasibility with respect to Eq.Q]is main¬ 
tained at every stage. In addition, advanced knowledge or a priori 
assumptions on optimal sparsity levels or patterns is not required. 

Several problems of broad interest to the signal processing com¬ 
munity and elsewhere do not meet the constraint qualifications of 
compressive sensing, an example of which is presented in Section 3 
in the context of sparse lowpass filter design. Specialized algorithms 
that make use of heuristics and/or side information are often de¬ 
signed and lead to sufficiently sparse solutions for these problems. 
The class of algorithms proposed in this paper does not require such 
constraint qualifications and is therefore of use when such qualifica¬ 
tions either fail or cannot be verified. Further, any side information 
or heuristic knowledge of the sparsity pattern of an optimal solution 
is easily incorporated into an algorithm of this framework with only 
minimal adjustment. 

In Section 2 the general framework for generating greedy tree- 
search algorithms which solve problems of the general class de¬ 
scribed by Eq. Q] is presented. In this presentation, a number of 
computational tools each node must be equipped with are described 
and example routines are provided for each. We conclude this pre¬ 
sentation with a discussion emphasizing the consequences of several 
critical design decisions. Finally, in Section 3, an example algorithm 
adhering to the general framework is presented and evaluated. 

1.1. Notation and nomenclature 

Vectors will be denoted using an underscore and vector superscripts 
will be used to index vectors as opposed to indexing vector elements, 
i.e. x[ a) and represent distinct coordinates of the same vector. 
For simplicity of exposition we proceed assuming all vectors are in 
R N , extensions to complex-valued and/or N± x N 2 -dimensional vec¬ 
tor spaces follow readily. In order to avoid confusion we henceforth 
restrict the word node to only mean a graph vertex and reserve the 
use of the term vertex to that of a polytope. 



2. UNVEILING THE TREE 

The general strategy underlying the framework described in this sec¬ 
tion is to convert the convex constraint satisfaction problem in Eq. Q] 
to a problem of graph traversal on a rooted tree. Every node then 
processes a set of elements belonging to S, either received from ini¬ 
tialization or a parent node, in order to produce a new set of elements 
which also belong to S but have fewer non-zero coordinates. Using 
this set of elements the node unveils if it has children of its own; if 
so it may pass its set of solutions to one or more of its children and 
the process repeats. The topology of the tree directly relates to the 
sparsity of feasible solutions in two ways: (i) the number of non-zero 
coordinates in the set of feasible solutions decreases with each suc¬ 
cessive generation of the tree and (ii) siblings of each group possess 
feasible elements with distinct sparsity patterns. 

2.1. Initializing the root node 

We begin by equipping the root node with a set V of M possibly 
distinct elements drawn from the feasible set S in Eq. |T] i.e. 

V = jx w : x (i) e S, 1 < i < M j (2) 

where each is generally dense. Note that V is the vertex repre¬ 
sentation of a particular poly tope embedded within <S, i.e. the convex 
hull of the elements of V forms a closed convex polytope contained 
within S. For problems where S is defined or naturally described 
using half-space representation, i.e. as a finite collection of linear 
equality and inequality constraints, standard vertex enumeration al¬ 
gorithms may be directly used to populate vm. Alternatively, the 
use of convex programs with random or systematically chosen objec¬ 
tive functions can be used to obtain either well-spread vertices of S 
or elements with other desirable properties. As will become evident 
shortly, any greedy algorithm adhering to the proposed framework 
will not be able to obtain an optimal solution to Eq. |T| which is con¬ 
tained in S but not in the convex hull of £0. 

A pertinent question at this stage involes the selection of the de¬ 
sign parameter M. To address this issue we refer to (9) in which a 
cyclic polytope in an iV-dimensional space with v vertices is shown 
to achieve the maximum number of obtainable facets k. Therefore, 
the corresponding dual polytope maximizes the number of vertices 
for a fixed number of facets where v = 0{k L^"J ). This result ren¬ 
ders complete vertex enumeration of S computationally intractable 
in practice without imposing additional structure. We defer further 
comments on computational complexity to Section 2.2.3. 

2.2. Incremental sparsity: computation at each node 

Each node of the tree, after having received a convex polytope V in 
vertex representation from its parent node (or initialization for the 
root), is equipped with three primary functions: (i) UNVEIL CHIL¬ 
DREN, (ii) VANISHCOORDINATE, and (iii) REDUCECOMPLEXITY. 
In what follows we say that a node vanishes coordinate d when the 
result of its processing produces a new polytope V' such that 

x d = 0, VxG V'. (3) 

Further, each node of the tree satisfies the following property: for ev¬ 
ery x G V' , x k = 0 for all coordinates k vanished along the unique 
path to the root. In addition, every node in generation g contains a 
set of feasible solutions x to Eq.|T] which satisfy \\x\\q < N — g. 

'This discludes algorithms which successfully transform leaves into par¬ 
ents by drawing additional elements from S, as described in Section 2.2.1. 


Pseudocode 1 A subroutine which determines the set X of potential 
children for a given set of feasible solutions V. 
function unveilChildrenOP) 

X <— jd: 3 x^ + \ x^ G V s.t. x^ > 0 andx^ < 0 j 

end function 


Pseudocode 2 A subroutine which populates V' using elements of 
V by vanishing coordinate d using Eq.[4]for £ = 2. 
function VANISHCOORDINATE(7 :> ,d) 
for each x^ G V with x^ > 0 do 
for each x ^ G V with x^ < 0 do 



end for 
end for 
end function 


2.2.1. unveilChildren 

A fundamental ability required of each node in order to unveil the 
tree is to identify which, if any, offspring said node can produce. 
Stated another way, each node needs to be equipped with a subrou¬ 
tine which identifies the set of coordinates it can vanish given the 
polytope V received from its parent node. A sufficient condition for 
a particular node to vanish coordinate d requires its received poly¬ 
tope V to contain at least one element x^ such that ^ +) > 0 and 
at least one element x ^ such that £ d ] < 0. When this condition is 
satisfied, we refer to the potential offspring as child d. Let X denote 
the set of all such possible children for a given node, then that node 
may produce a maximum of \X\ children. Pseudocode Q] describes 
the subroutine unveilChildren which generates X for a given V. 
A node is then classified as a leaf of the tree provided that it cannot 
generate any children, i.e. \X\ — 0. 

Termination criteria for algorithms adhering to the framework 
proposed in this paper include, but are not limited to, identifying ei¬ 
ther the longest or a sufficiently long path from the root to a leaf. 
Subtree exploration, i.e. transforming a leaf node into a parent, is 
sometimes achievable by drawing additional elements from S. For 
example, by drawing elements using a convex program where the 
objective function is designed to target the sign of a specific coordi¬ 
nate while simultaneously imposing additional constraints to vanish 
all coordinates along the direct path to the root. 

2.2.2. VANISHCOORDINATE 

Consider a node having received polytope V and fix some d G X 
which we wish to vanish in the sense of Eq. [3] The general strategy 
by which the set V' is populated is presented next. In order to ensure 
that elements of V' maintain feasibility with respect to Eq. Q] and 
simultaneously satisfy Eq. [3j we make explicit use of the properties 
of a convex combination. Specifically, let x^\ • • • , x^ G V, then 

£ £ 

x' = ajX^ G V for = 1,0;* > 0. (4) 

i= 1 i= 1 

It is straightforward to then show that given the qualification for d to 
be an element of X , there exists a set of linear combination weights 
ai,..., ol£ satisfying Eq.0|such that x' d = 0. Any such combination 
for which this holds may then be used to systematically generate an 
element of V'. 












Fig. 1. An example depicting two generations of a partially unveiled 
tree (left) and a cross-section of the polytope embeddings (right) cor¬ 
responding to the highlighted walk from the root node to child 6 to 
child 2. 


Pseudocode 3 A function which reduces the computational com¬ 
plexity required to vanishing coordinate d using VANISHCOORDI- 
NATE._ 

function REDUCECOMPLEXITY^d) 

V' at most M d +>> unique vertices from V with x d > 0 
V' <— at most M d ~^ unique vertices from V with x d < 0 
end function 


Pseudocode 4 An example of a single-path pre-walk coordinate se- 
lection rule for runtime execution._ 

function selectCoordinate(;P,X) 

d! <- arg max|{x G V : x d > 0}| • \{x G V : x d < 0}| 

end function 


Pseudocode [2| contains the subroutine vanishCoordinate 
which generates this polytope using Eq.|4]with i — 2 and a proper 
selection of the weights and a 2 . In addition, Eq.|4] implies both 
that the coordinates previously vanished in generating V remain van¬ 
ished during the population of V' and that since x^ ^ x^ ^ 0 
then V' C V, i.e. V' is embedded within V. 

Depicted on the left of Figure 1 is two generations of a partially 
unveiled tree. Let V,Vi, and V 2 denote the polytopes on the high¬ 
lighted path belonging to nodes “root node”, x ^ (child 6), and x^ 
(child 2), respectively. A cross-section of the polytope embedding 
V 2 C Vi C V C S is depicted on the right. Also depicted is a 
different walk, i.e. “root node” to x ^ (child 2) to x^ (child 6), 
ending with a set of vertices belonging to a different group of sib¬ 
lings demonstrating another set of elements with the same sparsity 
pattern as the elements of V 2 . 

2.2.3. reduceComplexity 

The computational complexity required to compute all possible ele¬ 
ments of V' as described by the example routine VANISHCOORDI¬ 
NATE in Pseudocode 2 is generally intractable, especially for prob¬ 
lems with long root-to-leaf distances. Indeed, let V denote the re¬ 
ceived poly tope of a node and let M^ > 0 denote the number of 
elements x^V satisfying x d > 0 and M d ~^ > 0 the number of el¬ 
ements satisfying x d < 0 for some del Further, let V' denote the 
result of vanishing coordinate d using VANISHCOORDINATE. Then 
V' results in 

M' d = (5) 

possibly repeated vertices. It immediately follows that the total num¬ 
ber of vertices generated for a given node in generation g is on the 
order of G(M 2 ) where the root nodes polytope was populated us¬ 
ing M vertices. By limiting the number of vertices used to generate 
an embedded polytope with coordinate d vanished, the problem of 
tree traversal remains computationally tractable. This is explicitly at 
the expense of excluding regions of the poly tope V in forming V' 
where the maximally sparse solution may reside. 

A simple procedure to control the computational burden at each 
node is described in Pseudocode [3] where the function reduce¬ 
Complexity limits the number of vertices satisfying x d > 0 and 
x d < 0 to M d ^ < M d + ^ and M d ~^ < M d ~\ respectively. In 
our experience with several examples, the number of unique vertices 
generated by vanishCoordinate is smaller than M d , especially 
in later generations of the tree. This heuristic may help guide the 
dynamic selection of M^ and M d ~^ for a particular problem. 


2.3. Tree-search protocols 

A number of instrumental design decisions informed by the prob¬ 
lem at hand must be made when crafting a greedy algorithm of the 
general framework presented in this paper. For example, a critical 
decision involves selection of the tree traversal protocol which will 
be implemented, e.g., a depth-first or breadth-first search. This deci¬ 
sion is obfuscated by the fact that the trees topology is unknown at 
the outset. When considering a breadth-first search, it is important 
to understand that while the width of the tree w g at generation g, i.e. 
the total number of nodes across generation g , is upper bounded by 

Wg — Wg-1 (N - g 3-1) (6) 

where wq = 1 , the details of the problem at hand may signifi¬ 
cantly restrict the width for a number of reasons resulting in breadth- 
first protocols which are much more computationally attractive than 
those implied by Eq. [6] 

In considering depth-first protocols, the selection of either a run¬ 
time order selection algorithm or a predetermined order may depend 
critically upon the availability of heuristics or side information. For 
example, using the indices of a magnitude sorted 1-norm relaxation 
of Eq.[l]or any other available side information may result in solu¬ 
tions of higher sparsity or earlier identification of deep leaves. Pseu¬ 
docode 4 presents an example run-time elimination order subrou¬ 
tine SELECTCOORDINATE for a depth-first protocol which uses no 
heuristic or side information but generally aims to select coordinates 
to vanish which maximize the number of nodes in the embedded 
polytope V' produced using VANISHCOORDINATE. 

2.4. Subtree exploration 

Techniques which are agnostic to the details of the particular prob¬ 
lem at hand but attempt to restart the tree upon discovery of a leaf, or 
even more generally determine if there are children other than those 
identified using UNVEILCHILDREN on a particular nodes polytope 
during runtime, can easily be incorporated into the design of an algo¬ 
rithm to allow further exploration of the trees topology. We refer to 
such a routine as a subtree exploration routine. One such technique, 
as mentioned previously in Section 2.1.1, is to attempt to transform 
leaves into parent nodes and depends directly upon the computa¬ 
tional ease at which additional elements of S may be drawn with the 
addition of appropriately vanished coordinates. The class of algo¬ 
rithms as described until now is easily modified such that these types 
of subtree exploration attempts can be made at either the discovery 
of a leaf node or after no node in the unveiled tree has unexplored 
children. 













Another subtree exploration technique, which may be used to 
both trim branches and identify unexplored children, involves end¬ 
ing the exploration down a given node a having received polytope 
V a in a fixed generation g for which another node b in the same 
generation has received polytope Vb with elements consisting of the 
same sparsity pattern. Denote the union of the vectors in the two 
polytopes as V a u b, i.e. 

VaUb = {x | X e Va or X G Vb} (7) 

and assign this polytope to node b. Then the algorithm continues 
from node b and exploration is discontinued through node a. This 
technique is equivalent to searching for potential children nodes in¬ 
side the convex hull of the two poly topes. 

3. NUMERICAL EXAMPLE 

Maximally sparse filters, despite being a computationally difficult 
design problem, result in systems which have demonstrable advan¬ 
tages as compared to dense systems with respect to a number of prac¬ 
tical metrics ED. Although sparse filter design formulated in half¬ 
space representation does meet the constraint qualifications of the 
compressive sensing framework, many well-known existing design 
methods have a similar flavor to those used in compressive sens¬ 
ing, i.e. they are broadly classified as greedy iterative algorithms 
which make use of, e.g., weighted 1-norm linear programs GD a 
fair comparison with methods such as IHT and CoSaMP cannot be 
made in part due to various assumptions being violated. Further ap¬ 
proaches, such as non-convex programs, have been used to produce 
filters with very sparse impulse responses fl2l . The use of alterna¬ 
tive convexity principles has also been previously applied to filter 
design, e.g., in rm an initial design is iteratively projected between 
two convex sets resulting in a final design satisfying the constraints 
of both sets but is not explicitly optimal with respect to any metric. 

In this section we generate an example algorithm adhering to 
the general framework proposed in Section 2 and apply it toward 
the design of a sparse, causal, Type I linear-phase finite-impulse- 
response lowpass filter h[n\ with support [0, 21V]. In particular, let 
Q p b and Q s b denote the respective passband and stopband where 


pb — \j-^k • k E 

[ < d p b , OJpb] } 

(8) 

SSS Ldk G [ 7T, 

~UJ sb ] U [Uj sb , 7r]} 

(9) 


for 0 < ujpb < w s b < 7T and 1 < k < K where K is chosen 
to sufficiently sample the frequency axis with respect to the filter 
support. We impose frequency-domain attenuation constraints such 
that a candidate impulse response is said to be feasible if its Fourier 
transform amplitude deviates no more than S p b and 5 s b from the ideal 
amplitude response over the passband and stopband, respectively. 
Let the ideal amplitude response, denoted by D(cj), be unity on Q p b 
and zero on Then, using the notation in Eq. |TJ the feasible set 
S is written as 

S = {x€K JV : \T(u,x) - D(u)\ < 5 pb ,u € Q pb , (10) 
|T (uj,x) - D (w)| < 6 sb ,u! € O si) } 

where 

N 

T(ce, x) = ^^x k cos (uo (k — 1)) (11) 

k =i 

and x % — h[ 0] and x k = 2h[k — 1] for 2 < k < N. The design 
example as formulated in this section is easily extended to describe 


Algorithm 1 A single-path, depth-first algorithm with randomized 
vertex reduction and a run-time order selection subroutine._ 

1 4- unveilChildren('P) 

while 1 ^ 0 do 

d <— selectCoordinate(P,Z) 

V <- reduceComplexityOP,^) 

V vanishCoordinateOP,gO 
X <- unveilChildren('P) 

end while 



Fig. 2. An impulse response corresponding to one sparse solution 
generated using Algorithm 1. Zero valued coefficients are marked 
with red x’s. 


other classes of filters possibly including additional constraints, e.g., 
in fl4l a number of common filter constraints are formulated as 
closed convex sets. The design specifications, similar to those found 
in an example in noi . are chosen as follows: passband cutoff fre¬ 
quency uj p b — 0.207T, stopband cutoff frequency uj s b = 0.257T, pass- 
band ripple S p b = 0.01 (linear), stopband ripple S s b = 0.1 (linear), 
and support parameter N = 31. 

Algorithm 1 describes a generic single-path depth-first algo¬ 
rithm using the example run-time order selection subroutine SE- 
lectCoordinate from Pseudocode 4. In order to apply Algo¬ 
rithm 1 to the sparse filter design problem, the root nodes initial 
polytope V is populated using M — 500 vertices drawn from S. 
The vertices are in particular selected by solving a sequence of linear 
programs where the coefficients of the objective vector are chosen 
uniformly in [—1,1]. In order to retain tractability, the subroutine 
reduceComplexity is used with M (+) ,M (-) < 500 at each 
generation, i.e. the polytope V cannot exceed 250, 000 vertices at 
any given generation. The algorithm, as described, does not attempt 
subtree exploration, i.e. no attempt is made to draw further samples 
in order to transform a leaf into a parent and thus the termination 
criterion is the discovery of a leaf. A randomly selected element of 
the leaf nodes polytope, transformed into an impulse response h[n], 
is depicted in Figure 2. The length of the root-to-leaf walk for this 
example is 15 and is directly related to the sparsity of the obtained 
impulse response. 

Variations and extensions to Algorithm 1 follow immediately, 
such as utilizing alternative order selection rules as a function of the 
current sparsity level or pattern. For example, a natural alternative 
is to select a fixed ordering prior to runtime corresponding to the 
indices of the magnitude-sorted coefficients of the solution to the 1- 
norm relaxation of Eq.Q] In comparing these two path selection rules 
for a number of examples, the order selection rule in Pseudocode 4 
tends to result in sparser impulse responses. This may in part be un¬ 
derstood by the fact that thresholding the 1-norm solution generally 
does not degrade gracefully from the frequency-domain constraints. 
Using subtree extensions and other variations generally resulted in 
impulse responses of different sparsity levels and patterns. 
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